I.  Introduction

Several quite powerful methods are available for analyzing asymptotic
properties of many kinds of stochastic approximations with gain sequences which are
either constants, or tend to zero. Methods for the first type of gain sequence -
examples of which abound - are much less well known. In this paper, we develop
a method for such a problem, based on "stability", "averaging" and "diffusion
approximations via weak convergence theory" - ideas which have served very well
in many other areas. The techniques have great power and applicability. The
general idea will be outlined and a simple example dealt with in detail. Despite
the simplicity of the example, it illustrates well the general approach, the kind
of calculations which need to be done and the type of results which can be
expected. The example is used simply as a vehicle for explaining the main ideas.
Nevertheless, only the surface of a large subject will be touched. There is
much work on the problem (when the gain sequence tends to zero) of the asymptotic
properties of stochastic approximation via the study of the stability of an
associated ordinary differential equation [1], [2]. The local asymptotic behavior
(near the limit points), on the other hand, is obtained via the study of an
associated stochastic differential equation (as in rate of convergence studies [1],
[3], [4]). Here similar intentions are pursued for processes with constant (but
small) gains, state-dependent noise, and perhaps discontinuous forcing functions.

We are concerned with asymptotic properties ($n \to \infty$, then $\epsilon \to 0$) of a
subclass of vector stochastic difference equations of the form


(1.1)    $Y^\epsilon_{n+1} = Y^\epsilon_n + \epsilon h_\epsilon(Y^\epsilon_n, \xi^\epsilon_n) + \sqrt{\epsilon}\, g_\epsilon(Y^\epsilon_n, \xi^\epsilon_n) + o(\epsilon)$,

$Y^\epsilon_n \in R^r$ = Euclidean r-space,


where $h_\epsilon$ and $g_\epsilon$ are not necessarily continuous everywhere and $\{\xi^\epsilon_n\}$ is a sequence
of correlated (and perhaps $\{Y^\epsilon_n\}$-dependent) random variables, for each $\epsilon > 0$.
Equations such as (1.1) occur very frequently in applications in stochastic
control, communication and automata theory. Often the $g_\epsilon$ or $h_\epsilon$ are indicator
functions; i.e., the iterate $Y^\epsilon_n$ moves "left" or "right" by $\epsilon$, depending only on
whether some event or other occurred.

There is a vast (stochastic approximation) literature on (1.1) when $\epsilon$ is
replaced by a sequence $\epsilon_n \to 0$. But, very frequently in implementations of
stochastic approximation, $\epsilon_n$ is either held constant or else $\epsilon_n \downarrow \epsilon > 0$ as $n \to \infty$,
owing to the desire to track changes or for the purpose of improving robustness.
Consider a particular case of (1.1), a scalar Robbins-Monro procedure of the form

(1.2)    $X^\epsilon_{n+1} = X^\epsilon_n - \epsilon\, \mathrm{sign}[k(X^\epsilon_n) + \xi^\epsilon_n]$,

where $\xi^\epsilon_n$ has a symmetric distribution, the $\{\xi^\epsilon_n\}$ are not necessarily independent (and
might even depend on the iterates), $k(\cdot)$ is continuous, and there is a $\theta$ such
that $k(x) > 0$ for $x > \theta$, and $k(x) < 0$ for $x < \theta$. In particular, suppose that
for $x \ge \theta$ (resp., $x \le \theta$), $k(x)$ is bounded below (above, resp.) by an increasing
function which has a non-zero slope near the origin. We might want to prove that
$X^\epsilon_n \to \theta$ in probability as $n \to \infty$ and $\epsilon \to 0$, or even to get a good idea of the
statistical structure of the tail of the sequence $\{X^\epsilon_n - \theta\}$. Define
$U^\epsilon_n = (X^\epsilon_n - \theta)/\sqrt{\epsilon}$. Something akin to a rate of convergence can be obtained by
studying the asymptotic properties of $\{U^\epsilon_n\}$ - which, as we'll see, brings us back to
a process of the form (1.1) with a $\sqrt{\epsilon}$ term included.
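To make the recursion (1.2) and the normalization $U^\epsilon_n = (X^\epsilon_n - \theta)/\sqrt{\epsilon}$ concrete, the following is a minimal simulation sketch. It rests on assumptions which are not part of the paper: $k(x) = x - \theta$, independent Gaussian noise, and the function names, seed and parameter values are chosen only for illustration.

```python
import math
import random


def simulate_sign_rm(eps, n_steps, x0=1.0, theta=0.0, noise_std=1.0, seed=0):
    """Iterate X_{n+1} = X_n - eps * sign[k(X_n) + xi_n], as in (1.2).

    Illustrative assumptions: k(x) = x - theta and i.i.d. N(0, noise_std^2)
    noise. The paper allows correlated, state-dependent noise.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, noise_std)
        # sign[.] is taken as +1 on {k(x) + xi >= 0} and -1 otherwise.
        s = 1.0 if (x - theta) + xi >= 0.0 else -1.0
        x -= eps * s
        path.append(x)
    return path


def normalized_tail(path, eps, theta=0.0, burn_in=0):
    """Return U_n = (X_n - theta) / sqrt(eps) for n >= burn_in."""
    root = math.sqrt(eps)
    return [(x - theta) / root for x in path[burn_in:]]
```

Calling `normalized_tail(simulate_sign_rm(0.01, 50_000), 0.01, burn_in=10_000)` discards the transient and returns the normalized tail. Under these assumptions its empirical mean square stays of order one as $\epsilon$ is decreased, which is the kind of behavior the tightness result below formalizes.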

There are numerous other applications of (1.1) - particularly to systems
whose movement is determined partly by the satisfaction of a "logical" criterion.
Such systems have not received much attention, important though they are. One very
clever approach and some nice applications appear in [5], [6]. Although that method
and ours both exploit "averaging" phenomena, our method is different. We are not
confined to finite intervals $\{n: \epsilon n \le T\}$ for some arbitrary $T$ (indeed, it is the
"tail" which is of most interest here), and [5] and [6] did not explicitly deal
with a $\sqrt{\epsilon}$ factor in (1.1). The $\sqrt{\epsilon}$ factor always occurs when a local analysis
(involving $U^\epsilon_n$) is done. When this factor is present the limiting determining
equation is a stochastic rather than ordinary differential equation. We
concentrate on the asymptotic part of $\{X^\epsilon_n\}$ or $\{U^\epsilon_n\}$ for small $\epsilon$, often the part of
greatest interest. One rather direct but useful method for the asymptotic problem
is in [4].

In practice $\epsilon$ is often small. But, generally, it is very hard to get
information on what happens when $\epsilon$ is not small, just as in stochastic approximation
it is hard to get information on the theoretical behavior when n is not large.
A main question is how well the theory for small $\epsilon$ predicts what happens in other
cases. Simulations on continuous parameter problems which resemble these in
certain respects indicate that the prediction is good for many cases when the
parameter $\epsilon$ is in a "normal" range. In some cases, the asymptotic behavior is
better than predicted. Generally, the closeness of the prediction to the behavior
actually observed seems to depend heavily on the form of the nonlinearities, and
on the correlation structure of the noise, and generalizations are hard to make.
By concentrating on large n, we are effectively concentrating on what happens after
the "transient" period is over.

The techniques used here come from references [7], [8], [11], which concern the
general problem (1.1) (or continuous time analogues) under various sets of
assumptions. The emphasis in [7] is on the case where $g_\epsilon$, $h_\epsilon$ are smooth, but
an outline is given of the method for the more general case. Reference [8]
treats two problems in great detail, one arising in an automata approach to
telephone traffic routing, and the other a recursive quantizer in communication
theory. Our purpose here is to describe the general method, citing results from
[7], [8] and avoiding duplication of proofs wherever convenient. As an
illustration of the power and usefulness of the method, a detailed analysis of (1.2)
(for both the tail of $\{X^\epsilon_n\}$ and $\{U^\epsilon_n\}$) will be given. The development will illustrate
the usefulness of the approach for other problems. The case (1.2) is simple (but
not trivial), and a very similar method of analysis is used in more general cases.


Outline of the paper. The first important result (Theorem 2) concerns the
tightness of $\{U^\epsilon_n$, large n, small $\epsilon\}$. By this we mean that there is an $\epsilon_0 > 0$,
whose value is not important, such that for each $\epsilon \le \epsilon_0$, there is an integer
$N_\epsilon < \infty$ such that

(1.3)    $\lim_{K \to \infty} \sup_{\epsilon \le \epsilon_0,\ n \ge N_\epsilon} P\{|U^\epsilon_n| \ge K\} = 0$.

I.e., $\{U^\epsilon_n$, large n, small $\epsilon\}$ is bounded in probability. Such a result makes
possible a detailed asymptotic analysis of $\{U^\epsilon_n\}$, since it implies that the "tails"
of $\{X^\epsilon_n\}$ are uniformly close to $\theta$ in a specific statistical sense. To simplify
the development most details are for the special case (1.2). More general cases
can readily be dealt with in a very similar way, at the expense of a somewhat
heavier notation, but the simple case illustrates the main ideas and methods.

Next, we define a continuous parameter process $U^\epsilon(\cdot)$. The $\{N_\epsilon\}$ will always
satisfy (1.3). Let $\{n_\epsilon\}$ denote any sequence of integers satisfying $n_\epsilon \ge N_\epsilon$,
$n_\epsilon \to \infty$ as $\epsilon \to 0$. Their values will either be stated when needed or will be
unimportant. Define the process $U^\epsilon(t) = U^\epsilon_{n_\epsilon + i}$ on $[i\epsilon, i\epsilon + \epsilon)$. Define $Y^\epsilon(\cdot)$ by
$Y^\epsilon(0) = Y^\epsilon_0$ and $Y^\epsilon(t) = Y^\epsilon_n$ on $[\epsilon n, \epsilon n + \epsilon)$ (refer to (1.1)). The general result is
that $U^\epsilon(\cdot)$ converges (under suitable conditions) weakly in $D[0,\infty)$ to a
Gauss-Markov diffusion $U(\cdot)$ which satisfies an equation of the form

(1.4)    $dU = -GU\,dt + \sigma\,dB$,

where $B(\cdot)$ is a standard Brownian motion and $G > 0$. If $n_\epsilon \to \infty$ fast enough as
$\epsilon \to 0$, then the limit $U(\cdot)$ is stationary. All terms, conditions and parameters
will be defined below, but first, let us examine the limit (1.4).

For small $\epsilon$, the processes $\{X^\epsilon_n, U^\epsilon_n\}$ (or $\{Y^\epsilon_n\}$, for the case (1.1)) have a
(perhaps long) transient period, before their distributions "settle down" -
especially if $(X^\epsilon_0 - \theta)$ is large compared to $\epsilon$. We are concerned only with the
asymptotic part - after the so-called transient period is over; i.e. when
$U^\epsilon_n = O(1)$. We can get the variance of the asymptotic part from (1.4), since
$X^\epsilon_n \sim \sqrt{\epsilon}\, U^\epsilon_n + \theta$. Also, (1.4) gives the local correlation structure of the
asymptotic part of the process. Weak convergence methods are a very natural and
convenient tool for getting our results. The details of the derivation of (1.4)
will be given for the model (1.2). Very similar methods can be used for more
complex problems [7], [8]. Next, some comments and definitions concerned with weak
convergence theory will be given. Then we comment on the so-called martingale
problem of Stroock and Varadhan [9], which provides a characterization of the
desired limit process that simplifies the proof that it actually is the limit.
Section II contains the main background theorems. For (1.2), tightness of
$\{U^\epsilon_n$, large n, small $\epsilon\}$ is proved in Section III, and in Section IV we show how to
get (1.4) for the case of model (1.2).
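The way (1.4) yields the variance and local correlation structure of the asymptotic part can be sketched numerically. The stationary solution of $dU = -GU\,dt + \sigma\,dB$ has variance $\sigma^2/(2G)$ and covariance $(\sigma^2/2G)e^{-G|\tau|}$, which is a standard fact about this linear equation. The function names below are ours, and the last function simply applies $X^\epsilon_n \sim \sqrt{\epsilon}\,U^\epsilon_n + \theta$.

```python
import math


def ou_stationary_variance(G, sigma):
    """Stationary variance sigma^2 / (2 G) of dU = -G U dt + sigma dB, G > 0."""
    if G <= 0.0:
        raise ValueError("G must be positive for a stationary solution")
    return sigma * sigma / (2.0 * G)


def ou_stationary_covariance(G, sigma, lag):
    """Stationary covariance E U(t) U(t + lag) = sigma^2/(2G) * exp(-G |lag|)."""
    return ou_stationary_variance(G, sigma) * math.exp(-G * abs(lag))


def predicted_iterate_variance(eps, G, sigma):
    """Small-eps prediction for Var(X_n) in the tail: eps * sigma^2 / (2 G)."""
    return eps * ou_stationary_variance(G, sigma)
```

Since the interpolation uses time steps of length $\epsilon$, a lag of m iterations corresponds to `lag = m * eps` in `ou_stationary_covariance`.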


A note on weak convergence theory. Only a few comments will be made. For
full details see [10], or [1], Chapter 3, for a brief summary. $D^r[0,\infty)$ denotes
the space of $R^r$-valued functions on $[0,\infty)$ which are right continuous and have
left-hand limits. The "process" $U^\epsilon(\cdot)$ is treated as a random variable defined
on the sample space $D^r[0,\infty)$ and induces a measure on it which we denote by $P^\epsilon$
(a useful topology, called the "Skorokhod" topology, is used on $D^r[0,\infty)$; see [10]).
$\{U^\epsilon(\cdot)\}$ is said to be tight iff for each $\delta > 0$ there is a compact set
$K_\delta \subset D^r[0,\infty)$ such that $P^\epsilon\{K_\delta\} \ge 1 - \delta$, all $\epsilon$. $U^\epsilon(\cdot)$ is said to converge weakly
to $U(\cdot)$ if $U(\cdot)$ has paths in $D^r[0,\infty)$, and induces a measure $P$ on it, and for
each bounded, real-valued continuous function $F(\cdot)$ on $D^r[0,\infty)$,
$\int F(v)\,dP^\epsilon(v) \to \int F(v)\,dP(v)$ as $\epsilon \to 0$. The basic result is: if $\{U^\epsilon(\cdot)\}$ is tight on
$D^r[0,\infty)$, then each subsequence contains a further subsequence which converges
weakly to some process with paths in $D^r[0,\infty)$. Our job here is to characterize
the limit process and to show that it does not depend on the subsequence. Thus
weak convergence is a substantial extension of convergence in distribution. It is
a tool that is extremely useful in many areas of applied probability where limit
or approximation problems are of concern. Criteria for tightness and weak
convergence are often given in terms of the multivariate distributions of the
processes $\{U^\epsilon(\cdot)\}$. See Section II.


A note on the martingale problem. This problem arises because in the weak
convergence analysis it is convenient to characterize the limit $U(\cdot)$, whatever
it may be, by showing that it solves a certain set of equations which are known
as the martingale problem. Then we show that the $U(\cdot)$ of (1.4) is the only
solution to that martingale problem. Let $x(\cdot)$ be the solution to the stochastic
differential equation (SDE)

(1.5)    $dx = b(x,t)\,dt + \sigma(x,t)\,dB$,    $B(\cdot)$ = standard Brownian motion in $R^r$,

and set $\sigma(x,t)\sigma'(x,t)/2 = a(x,t)$. Suppose that $b(\cdot,\cdot)$ and $\sigma(\cdot,\cdot)$ are continuous in
$(x,t)$ and satisfy a uniform Lipschitz and linear growth condition (in x). Then
it is well known that the SDE (1.5) has a unique solution in the Itô sense.
Since $\int_0^t \sigma(x_s,s)\,dB_s$ is a martingale, for any smooth bounded function $f(\cdot,\cdot)$, Itô's
lemma implies that

(1.6)    $f(x(t),t) - f(x(0),0) - \int_0^t (A + \partial/\partial s)\, f(x(s),s)\,ds \equiv M_f(t)$

is a martingale for each initial condition $x(0) = x$, where

(1.7)    $A = \sum_i b_i(x,t)\, \partial/\partial x_i + \sum_{i,j} a_{ij}(x,t)\, \partial^2/\partial x_i \partial x_j$

= differential generator of (1.5).

Let $y(\cdot)$ denote the generic element of $D^r[0,\infty)$, let $\hat{C}_0$ denote the
continuous functions on $R^r \times [0,\infty)$ with compact support and let $\hat{C}_0^{\alpha,\beta}$ denote the
subclass whose mixed $\alpha$ t-derivatives and $\beta$ x-derivatives are continuous. For $f$
in such a subclass, define

(1.8)    $M_f(t) = f(y(t),t) - f(y(0),0) - \int_0^t (A + \partial/\partial s)\, f(y(s),s)\,ds$,

where A is an operator of the form (1.7) - but not necessarily satisfying the
Lipschitz and growth conditions. If for each x, there is a measure $P_x$ on $D^r[0,\infty)$
such that $P_x\{y(0) = x\} = 1$ and $M_f(\cdot)$ is a $P_x$-martingale for each such $f$, then $\{P_x\}$ is
said to solve the martingale problem (of Stroock and Varadhan [9]). If $P_x$ is
unique for each x (as it is under the Lipschitz and growth condition cited above),
then the $\{P_x\}$ induce a Markov process on the sample space $D^r[0,\infty)$. Also, for
each x, there is a $B(\cdot)$ such that under $P_x$, the process $y(\cdot)$ satisfies (1.5) with
initial condition x.
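As a sanity check on (1.6) - (1.7), one can verify the martingale property in expectation for the scalar equation $dU = -GU\,dt + \sigma\,dB$ and the test function $f(u) = u^2$. This is only a sketch: $f(u) = u^2$ does not have compact support, so it lies outside the class used above and is chosen because $E\,U^2(t)$ is available in closed form. Here $A f(u) = -2Gu^2 + \sigma^2$, and the function names are ours.

```python
import math


def second_moment(t, u0, G, sigma):
    """E U(t)^2 for dU = -G U dt + sigma dB with U(0) = u0 (closed form)."""
    decay = math.exp(-2.0 * G * t)
    return u0 * u0 * decay + sigma * sigma / (2.0 * G) * (1.0 - decay)


def expected_generator_term(s, u0, G, sigma):
    """E[(A f)(U(s))] for f(u) = u^2, where A f(u) = -2 G u^2 + sigma^2."""
    return -2.0 * G * second_moment(s, u0, G, sigma) + sigma * sigma


def martingale_defect(t, u0, G, sigma, n_grid=20000):
    """E f(U(t)) - f(u0) - integral_0^t E[(A f)(U(s))] ds, by the trapezoid rule.

    If M_f(.) of (1.6) is a martingale, this expectation is zero up to
    quadrature error.
    """
    h = t / n_grid
    total = 0.5 * (expected_generator_term(0.0, u0, G, sigma)
                   + expected_generator_term(t, u0, G, sigma))
    for i in range(1, n_grid):
        total += expected_generator_term(i * h, u0, G, sigma)
    return second_moment(t, u0, G, sigma) - u0 * u0 - total * h
```

For instance, `martingale_defect(2.0, 1.5, 0.8, 1.0)` should be zero up to the quadrature error of the trapezoid rule.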

The martingale formulation of SDEs is very useful in convergence studies.
The basic background theorems for our case [7], [11] are proved by first showing
tightness of $\{U^\epsilon(\cdot)\}$ and then showing that $M_f(\cdot)$ is a martingale for each $f \in \hat{C}_0^{2,3}$
when any weak limit $U(\cdot)$ is substituted for $y(\cdot)$ and $P_x$ is replaced by the measure
of $U(\cdot)$ and A is the operator (1.7) (in our case this specializes to the
operator of (1.4)). Below, an assumption concerning uniqueness of the solution
of the martingale problem will be made. This is needed, for otherwise the weak
limits of $\{U^\epsilon(\cdot)\}$ might not be unique.


Truncation. For mathematical as well as practical reasons, it is helpful
to work with a truncated $\{X^\epsilon_n\}$. The particular technical difficulty introduced
in the untruncated case will be discussed after the proof of Theorem 2. The
truncation introduced now alters the basic problem (1.2) and has nothing to do
with the "technical" truncation ($b_N(\cdot)$) introduced in the first paragraph of
Section II. Suppose that we know real numbers $x_l, x_u$ such that $\theta \in (x_l, x_u)$. Let
the bar denote truncation to $[x_l, x_u]$. Then we replace (1.2) by

(1.9)    $X^\epsilon_{n+1} = \overline{(X^\epsilon_n - \epsilon\, \mathrm{sign}[k(X^\epsilon_n) + \xi^\epsilon_n])}$.

Equation (1.9) is the actual algorithm whose asymptotic properties are to be
studied. There are continuous functions $a_\epsilon(\cdot)$, $b_\epsilon(\cdot)$ such that $a_\epsilon(x) = b_\epsilon(x) = \epsilon$
in $[x_l + \epsilon, x_u - \epsilon]$ and both functions have values in $[0,\epsilon]$ and are infinitely
differentiable functions of x except possibly at the points $x_l + \epsilon$, $x_u - \epsilon$, and
they are such that

(1.10)    $X^\epsilon_{n+1} = X^\epsilon_n - a_\epsilon(X^\epsilon_n)\, I\{k(X^\epsilon_n) + \xi^\epsilon_n \ge 0\} + b_\epsilon(X^\epsilon_n)\, I\{k(X^\epsilon_n) + \xi^\epsilon_n < 0\}$,

where $I\{\cdot\}$ is the indicator function.
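One concrete pair $a_\epsilon(\cdot)$, $b_\epsilon(\cdot)$ with the properties just listed can be sketched as follows. The piecewise-linear choice is our own illustration and is not claimed to be the choice intended in the text. Both functions equal $\epsilon$ on $[x_l + \epsilon, x_u - \epsilon]$, take values in $[0,\epsilon]$, and on $[x_l, x_u]$ are smooth except at $x_l + \epsilon$ and $x_u - \epsilon$, respectively.

```python
def a_eps(x, eps, x_lo):
    """Leftward step size: eps in the interior, shrinking to 0 as x -> x_lo."""
    return max(0.0, min(eps, x - x_lo))


def b_eps(x, eps, x_hi):
    """Rightward step size: eps in the interior, shrinking to 0 as x -> x_hi."""
    return max(0.0, min(eps, x_hi - x))


def truncated_step(x, xi, eps, k, x_lo, x_hi):
    """One step of (1.10): move left on {k(x) + xi >= 0}, right otherwise.

    k is the regression function and xi the noise sample; with the step
    sizes above the iterate can never leave [x_lo, x_hi].
    """
    if k(x) + xi >= 0.0:
        return x - a_eps(x, eps, x_lo)
    return x + b_eps(x, eps, x_hi)
```

With this choice (1.10) coincides with (1.9), the usual projection onto $[x_l, x_u]$: a step which would overshoot an endpoint is shortened so that the iterate lands exactly on it.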


II.  Weak  Convergence  Theorems 

Notation and a comment on tightness. The theorem below, taken from [7],
assumes tightness of $\{Y^\epsilon(\cdot)\}$ (or $\{U^\epsilon(\cdot)\}$). Such tightness is not hard to get
under reasonable conditions by Theorem 2 of [7] if $\{Y^\epsilon(\cdot)\}$ (or $\{U^\epsilon(\cdot)\}$) is
bounded. This will be further commented on below. We can bound the processes
by a truncation device which, since it is used only as a technical tool, does
not lose us any generality, and it will now be explained. Loosely speaking,
if the truncated processes defined below exhibit a suitable weak convergence,
then so do the $\{U^\epsilon(\cdot)\}$ and $\{Y^\epsilon(\cdot)\}$ as originally defined. Define
$S_N = \{x: |x| \le N\}$, and let $b_N(x)$ be a function with values in [0,1], equal to unity in
$S_N$, equal to zero out of $S_{N+1}$, and infinitely differentiable. Define $Y^{\epsilon,N}_n$ (and
for (1.2), $U^{\epsilon,N}_n$) by

(2.1)    $Y^{\epsilon,N}_{n+1} = Y^{\epsilon,N}_n + [\epsilon h_\epsilon(Y^{\epsilon,N}_n, \xi^\epsilon_n) + \sqrt{\epsilon}\, g_\epsilon(Y^{\epsilon,N}_n, \xi^\epsilon_n) + o(\epsilon)]\, b_N(Y^{\epsilon,N}_n)$,

(2.2)    $U^{\epsilon,N}_{n+1} = U^{\epsilon,N}_n - \sqrt{\epsilon}\, \mathrm{sign}[k(X^{\epsilon,N}_n) + \xi^\epsilon_n]\, b_N(U^{\epsilon,N}_n)$,    $X^{\epsilon,N}_n = \sqrt{\epsilon}\, U^{\epsilon,N}_n + \theta$.

Let $Y^{\epsilon,N}(\cdot)$ and $U^{\epsilon,N}(\cdot)$ denote the corresponding interpolations of $\{Y^{\epsilon,N}_n\}$ and
$\{U^{\epsilon,N}_n\}$. We use $U^{\epsilon,N}(0) = U^\epsilon_{n_\epsilon}$ if $|U^\epsilon_{n_\epsilon}| \le N$, and set it equal to zero otherwise.
Similarly for $Y^{\epsilon,N}(0)$. Owing to the tightness of $\{U^\epsilon_n\}$, this causes no problem.
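A function $b_N(\cdot)$ of the required kind, and one step of the truncated recursion (2.2), can be sketched as follows. The particular construction (the standard $e^{-1/t}$ cutoff, with the transition taking place for $N < |u| < N + 1$) is an assumption made for illustration; any infinitely differentiable function with values in [0,1] and equal to unity in $S_N$ would serve.

```python
import math


def _bump(t):
    """exp(-1/t) for t > 0 and 0 otherwise; infinitely differentiable on R."""
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def b_N(u, N):
    """Smooth cutoff: 1 for |u| <= N, 0 for |u| >= N + 1, in [0, 1] between."""
    t = abs(u) - N
    num = _bump(1.0 - t)
    return num / (num + _bump(t))


def truncated_u_step(u, xi, eps, N, k, theta):
    """One step of (2.2): U <- U - sqrt(eps) * sign[k(X) + xi] * b_N(U)."""
    x = math.sqrt(eps) * u + theta
    s = 1.0 if k(x) + xi >= 0.0 else -1.0
    return u - math.sqrt(eps) * s * b_N(u, N)
```

Inside $S_N$ the step is exactly that of the untruncated recursion, and wherever `b_N` vanishes the iterate is frozen, so the truncated process is bounded.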


We will see below that (2.2) is consistent with (1.9) - (1.10) in that the
weak limits of $U^\epsilon(\cdot)$, where $U^\epsilon(\cdot)$ is obtained from (1.9) - (1.10), are the same
as those which Theorem 1 below gives us for (2.2), as $\epsilon \to 0$, then $N \to \infty$.

Suppose for the moment that for each N, $Y^{\epsilon,N}(\cdot)$ or $U^{\epsilon,N}(\cdot)$ converges
weakly to $U^N(\cdot)$, a diffusion process (with differential operator denoted by $A^N$)
whose part up to first escape from $S_N$ equals the part of $U(\cdot)$ up to first escape
from $S_N$. Then if the solution to the martingale problem for operator A (A being the
differential generator of $U(\cdot)$) is unique, $U^\epsilon(\cdot) \to U(\cdot)$ weakly. Let $E^{\epsilon,N}_n$ denote
expectation conditioned on $\{Y^{\epsilon,N}_i, i \le n, \xi^\epsilon_i, i < n\}$ or on $\{U^{\epsilon,N}_i, i \le n, \xi^\epsilon_i, i < n\}$,
depending on the case (2.1) or (2.2). $E^\epsilon_n$ is similarly defined when the superscript
N is absent.

The following is an adaptation of Theorem 1 of [7] to our case. It is
stated for the general case (1.1), (2.1).


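Theorem 1. Let the differential generators A and $A^N$ have continuous
coefficients, which are equal in $S_N$ for each N. Let the solution to the martingale
problem corresponding to operator A be unique on $D^r[0,\infty)$, for each $x \in R^r$. For
each N and $f(\cdot,\cdot) \in \mathcal{D}$, a dense set (sup norm) in $\hat{C}_0$, let there be a sequence
$\{f^{\epsilon,N}(\cdot)\}$ of random functions satisfying the following conditions. Each is
constant on each $[n\epsilon, n\epsilon + \epsilon)$ interval, at $n\epsilon$ it is measurable with respect to the
$\sigma$-algebra induced by $\{Y^{\epsilon,N}_j, j \le n, \xi^\epsilon_j, j < n\}$ and for each N

(2.3)    $\sup_{n,\epsilon} E|f^{\epsilon,N}(\epsilon n)| + \sup_{n,\epsilon} \frac{1}{\epsilon} E|E^{\epsilon,N}_n f^{\epsilon,N}(\epsilon n + \epsilon) - f^{\epsilon,N}(\epsilon n)| < \infty$,

(2.4)    $E|f^{\epsilon,N}(\epsilon n) - f(Y^{\epsilon,N}_n, \epsilon n)| \to 0$ as $\epsilon \to 0$ and $\epsilon n = t$,

(2.5)    $E\left|\frac{E^{\epsilon,N}_n f^{\epsilon,N}(\epsilon n + \epsilon) - f^{\epsilon,N}(\epsilon n)}{\epsilon} - \left(\frac{\partial}{\partial t} + A^N\right) f(Y^{\epsilon,N}_n, \epsilon n)\right| \to 0$ as $\epsilon \to 0$ and $\epsilon n = t$.

Then if $\{Y^{\epsilon,N}(\cdot), \epsilon \le \epsilon_0\}$ is tight on $D^r[0,\infty)$ for each N, where $\epsilon_0$ doesn't depend
on N, and $Y^\epsilon(0)$ converges in distribution to $U(0)$ as $\epsilon \to 0$, we have
$Y^\epsilon(\cdot) \to U(\cdot)$ weakly, the unique solution to the martingale problem for operator
A with initial condition $U(0)$.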

The main burden of proof is in finding the functions $\{f^{\epsilon,N}(\cdot)\}$, and the
method for this will be developed in Theorem 3. In our case, the $\{U^{\epsilon,N}(\cdot)\}$ can
be shown to be tight via the method of [7], Theorem 2, which also makes use of
the functions $\{f^{\epsilon,N}(\cdot)\}$.

Define the operator $\hat{A}^{\epsilon,N}$ by $\hat{A}^{\epsilon,N} f(\epsilon n) = \frac{1}{\epsilon}[E^{\epsilon,N}_n f(\epsilon n + \epsilon) - f(\epsilon n)]$.
$\hat{A}^{\epsilon,N}$ has the character of an infinitesimal operator. Considering the $f$ as
"test functions", the aim of the theorem is to find functions $f^{\epsilon,N}$ which are
close to the test functions and such that the action of $\hat{A}^{\epsilon,N}$ on $f^{\epsilon,N}$ is close to
the action of $(A^N + \partial/\partial t)$ on the test function $f$, for each N.
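In the simplest setting the closeness of the one-step operator to the limit generator can be checked directly, with no perturbation of the test function needed. The sketch below assumes (these are our choices, not the paper's) $k(x) = x - \theta$, independent N(0,1) noise, no truncation and a time-independent test function. Then $U_{n+1} = U_n \mp \sqrt{\epsilon}$ with probabilities $\Phi(\sqrt{\epsilon}\,U_n)$ and $1 - \Phi(\sqrt{\epsilon}\,U_n)$, and the limit generator is $-Guf_u + \frac{1}{2}f_{uu}$ with $G = \sqrt{2/\pi}$.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_step_operator(f, u, eps):
    """(1/eps) * [E f(U_{n+1}) - f(U_n)] given U_n = u, computed exactly.

    Assumes U_{n+1} = U_n - sqrt(eps) * sign(sqrt(eps) * U_n + xi) with
    xi ~ N(0, 1) independent of the past.
    """
    root = math.sqrt(eps)
    p_plus = std_normal_cdf(root * u)  # P{sqrt(eps) u + xi >= 0}
    expected = p_plus * f(u - root) + (1.0 - p_plus) * f(u + root)
    return (expected - f(u)) / eps


def limit_generator(f_u, f_uu, u):
    """A f(u) = -G u f_u(u) + (1/2) f_uu(u) with G = sqrt(2/pi), sigma = 1."""
    G = math.sqrt(2.0 / math.pi)
    return -G * u * f_u(u) + 0.5 * f_uu(u)
```

For example, with `f = math.sin`, the values `one_step_operator(math.sin, 0.7, 1e-4)` and `limit_generator(math.cos, lambda u: -math.sin(u), 0.7)` should agree up to a term of order $\epsilon$. With correlated noise this direct comparison fails, which is why the perturbed functions $f^{\epsilon,N}$ are needed.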

III.  Tightness of $\{U^\epsilon_n$, small $\epsilon$, large n$\}$

Assumptions. Henceforth we stick to problem (1.9) or (1.10). Since we
treat the case where the $\{\xi^\epsilon_n\}$ can depend on the $\{X^\epsilon_n\}$, some information on the
nature of the dependence must be provided. If $\{\xi^\epsilon_n\}$ were a sequence of
independent random variables, then the assumptions and development would be much simpler.
But it is worthwhile to do the general case, since various forms of it occur
frequently. We assume that there is an auxiliary process $\{\tilde\xi^\epsilon_n\}$ such that
$\{\tilde\xi^\epsilon_n, X^\epsilon_n\}$ is a Markov process with a homogeneous transition function, and that
$\xi^\epsilon_n$ is a function of $\tilde\xi^\epsilon_n$, $X^\epsilon_n$. If $X^\epsilon_n$ were held fixed at a value x for all time,
then we have a Markov process which we denote by $\{\tilde\xi_n(x)\}$ and whose transition
probabilities are defined by the marginals of that of $\{\tilde\xi^\epsilon_n, X^\epsilon_n\}$ given by
$P\{\tilde\xi_n(x) \in C \,|\, \tilde\xi_{n-1}(x), x\} = P\{\tilde\xi^\epsilon_n \in C \,|\, \tilde\xi^\epsilon_{n-1} = \tilde\xi_{n-1}(x), X^\epsilon_{n-1} = x\}$. Now a function $\xi_n(x)$
can be defined, where $\xi_n(x)$ is obtained from $(\tilde\xi_n(x), x)$ in the same way that $\xi^\epsilon_n$
was obtained from the pair $(\tilde\xi^\epsilon_n, X^\epsilon_n)$. Define the "partial" $\ell$-step transition
function for $\{\tilde\xi_n(x)\}$ by

(3.1)    $P(\tilde\xi_j(x), \ell, B \,|\, x) = P\{\xi_{j+\ell}(x) \in B \,|\, \tilde\xi_j(x), x\}$.

The conditions introduced below will be on the rate of convergence of certain
$\ell$-step transition functions to their limits as $\ell \to \infty$, and on the smoothness of the
transition functions. Some smoothness is required in order to "average out" the
discontinuity in the sign function. The symbol $K(x)$ will be used to denote the set
$[-k(x), \infty)$, and M denotes a constant whose value may change from usage to usage.

We use the following conditions, some in Theorem 2 and some in Theorem 3.

(A1) For each x, there is a unique measure $P(\cdot|x)$ such that

$\sum_{\ell=0}^\infty |P(\tilde\xi, \ell, K(x) \,|\, x) - P(K(x) \,|\, x)| < M$

uniformly in $\tilde\xi$ and in $x \in [x_l, x_u]$. $P(\cdot|x)$ would normally be the marginal
of the stationary measure of $\{\tilde\xi_n(x)\}$.

(A2) For each x, $P(\cdot|x)$ has a density $p(\cdot|x)$ which is symmetric about $\xi = 0$.
Define $g(x) = \int \mathrm{sign}[k(x) + \xi]\, p(\xi|x)\, d\xi$. Let $g(\theta) = 0$. $g(\cdot)$ is differentiable
at $x = \theta$ with $g_x(\theta) > 0$ and there is a non-negative, non-decreasing
function $q(\cdot)$ such that $g(x) \ge q(x)$ for $x \ge \theta$ and $g(x) \le -q(x)$ for $x \le \theta$ and
$q(x) > 0$ if $x \ne \theta$. $k(\cdot)$ is continuously differentiable.

(A3) $P([y,\infty) \,|\, x)$ and $P(\tilde\xi, \ell, [y,\infty) \,|\, x)$ are continuously differentiable in x,y, the
continuity being uniform in $\ell > 0$, $\tilde\xi$ and in x,y in bounded sets. Also (the
subscripts x, y denote the derivatives)

$\sum_{\ell=1}^\infty |P_x(\tilde\xi, \ell, K(x) \,|\, x) - P_x(K(x) \,|\, x)| < M$,

uniformly in $\tilde\xi$ and $x \in [x_l, x_u]$.

(A4) $P\{\tilde\xi_n(x) \in A \,|\, \tilde\xi_{n-1}(x)\}$ is continuous in x in some neighborhood of $\theta$,
uniformly in A and $\tilde\xi_{n-1}(x)$.

(A5) $P(\tilde\xi, 1, [-b,b] \,|\, x) \to 0$ as $b \to 0$ uniformly in $\tilde\xi$ and in x in some neighborhood
of $\theta$.


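Let $P^\theta_n$ denote the regular conditional distribution of $\{\xi_{n+j}(\theta), j \ge 0\}$
conditioned on $\tilde\xi_{n-1}(\theta) = \tilde\xi^\epsilon_{n-1}$, the actual sample value at time n-1. Let $E^\theta_n$ denote
the corresponding regular conditional expectation. We need convergence of the
conditional expectations of certain functions. In particular,

(A6) There are functions $q(m)$ which we write as (which define the expectation
operator $E^0$)

$q(m) = E^0\, \mathrm{sign}\, \xi_0(\theta)\, \mathrm{sign}\, \xi_m(\theta) = E^0\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_{j+m}(\theta)$

such that

$E^\theta_n\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_{j+m}(\theta) \to q(m)$ as $j \to \infty$

for all $\tilde\xi^\epsilon_{n-1}$ and $n \ge 0$. Also $\sum_{m=1}^\infty |q(m)| < \infty$ and

$\sum_{j=n}^\infty \sum_{\ell=j+1}^\infty |E^\theta_n\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_\ell(\theta) - E^0\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_\ell(\theta)| \le M$

for all $\tilde\xi^\epsilon_{n-1}$ and $n \ge 0$.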
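A summability condition of the type (A1) is easy to illustrate on a noise process that is a two-state Markov chain whose law does not depend on x. This is a hypothetical example of ours, not one treated in the paper. With transition matrix [[1-a, a], [b, 1-b]], the $\ell$-step probabilities converge geometrically, at rate $|1 - a - b|$, to the stationary ones, so the sum in (A1) is finite.

```python
def two_state_a1_sum(a, b, start=0, target=1, n_terms=2000):
    """Partial sum over l >= 0 of |P^l(start, target) - pi(target)|.

    The chain has transition matrix [[1-a, a], [b, 1-b]] with 0 < a, b < 1,
    and pi is its stationary distribution (b/(a+b), a/(a+b)).
    """
    pi_target = a / (a + b) if target == 1 else b / (a + b)
    dist = [1.0, 0.0] if start == 0 else [0.0, 1.0]
    total = 0.0
    for _ in range(n_terms):
        total += abs(dist[target] - pi_target)
        # Advance the distribution one step: dist <- dist * P.
        dist = [dist[0] * (1.0 - a) + dist[1] * b,
                dist[0] * a + dist[1] * (1.0 - b)]
    return total
```

For a = 0.3, b = 0.2 the deviation from the stationary probability 0.6 of state 1, starting from state 0, is $0.6 \times 0.5^\ell$, so the sum converges to 1.2 whatever the number of terms. This is the uniform bound M of (A1) in this toy case.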


Define

$\epsilon g_\epsilon(x) = \int [a_\epsilon(x)\, I\{k(x) + \xi \ge 0\} - b_\epsilon(x)\, I\{k(x) + \xi < 0\}]\, p(\xi|x)\, d\xi$.

Note that $g_\epsilon(\cdot)$ has the properties ascribed to $g(\cdot)$ in (A2).

In Theorems 2 and 3, all $O(\cdot)$ and $o(\cdot)$ are uniform (in the $O(\cdot)$, $o(\cdot)$
property) in all variables other than their argument.
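For a concrete feeling for the averaged function $g(\cdot)$ of (A2), take the special case (our assumption) in which $p(\cdot|x)$ is the N(0, $s^2$) density for every x. Then $P(K(x)|x) = \Phi(k(x)/s)$ and $g(x) = 2P(K(x)|x) - 1 = \mathrm{erf}(k(x)/(s\sqrt{2}))$, so that $g_x(\theta) = \sqrt{2/\pi}\, k_x(\theta)/s$. The function names are ours.

```python
import math


def prob_K(x, k, noise_std=1.0):
    """P(K(x)|x) = P{xi >= -k(x)} for xi ~ N(0, noise_std^2)."""
    return 0.5 * (1.0 + math.erf(k(x) / (noise_std * math.sqrt(2.0))))


def g_gaussian(x, k, noise_std=1.0):
    """g(x) = E sign[k(x) + xi] = erf(k(x) / (noise_std * sqrt(2)))."""
    return math.erf(k(x) / (noise_std * math.sqrt(2.0)))


def g_slope_at_root(k_slope, noise_std=1.0):
    """g_x(theta) = sqrt(2/pi) * k_x(theta) / noise_std, using k(theta) = 0."""
    return math.sqrt(2.0 / math.pi) * k_slope / noise_std
```

The sign function is thus "averaged out" into a smooth, increasing function which vanishes at $\theta$, in agreement with the requirements of (A2).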


Theorem 2. Assume (A1) - (A3). There is a sequence of integers $N_\epsilon \to \infty$ as $\epsilon \to 0$
and an $\epsilon_0 > 0$ such that the double sequence $\{U^\epsilon_n, n \ge N_\epsilon, \epsilon \le \epsilon_0\}$ is tight.

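Proof. Part 1. The method is based on a combination of a Liapunov function and
an averaging technique. Define $V(x) = (x - \theta)^2$. Note that

$-\epsilon g_\epsilon(x) = -a_\epsilon(x) P(K(x) \,|\, x) + b_\epsilon(x)(1 - P(K(x) \,|\, x))$

and

(3.2)    $E^\epsilon_n V(X^\epsilon_{n+1}) - V(X^\epsilon_n) = V_x(X^\epsilon_n)\, E^\epsilon_n(X^\epsilon_{n+1} - X^\epsilon_n) + O(\epsilon^2)$,

(3.3)    $E^\epsilon_n(X^\epsilon_{n+1} - X^\epsilon_n) = -a_\epsilon(X^\epsilon_n)\, P(\tilde\xi^\epsilon_{n-1}, 1, K(X^\epsilon_n) \,|\, X^\epsilon_n) + b_\epsilon(X^\epsilon_n)(1 - P(\tilde\xi^\epsilon_{n-1}, 1, K(X^\epsilon_n) \,|\, X^\epsilon_n))$.

For small $\epsilon$,

$E^\epsilon_n (X^\epsilon_n - \theta)[-a_\epsilon(X^\epsilon_n)\, I\{k(X^\epsilon_n) + \xi^\epsilon_n \ge 0\} + b_\epsilon(X^\epsilon_n)\, I\{k(X^\epsilon_n) + \xi^\epsilon_n < 0\}]$

$\le \epsilon E^\epsilon_n (X^\epsilon_n - \theta)[-I\{k(X^\epsilon_n) + \xi^\epsilon_n \ge 0\} + I\{k(X^\epsilon_n) + \xi^\epsilon_n < 0\}]$.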

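Thus

(3.4)    $E^\epsilon_n V(X^\epsilon_{n+1}) - V(X^\epsilon_n) \le \epsilon V_x(X^\epsilon_n)[1 - 2P(\tilde\xi^\epsilon_{n-1}, 1, K(X^\epsilon_n) \,|\, X^\epsilon_n)] + O(\epsilon^2)$.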


Now, we introduce a perturbation $V^\epsilon_1(n)$ to the Liapunov function. It is
used as a technical device to allow us to effectively replace the $P(\tilde\xi^\epsilon_{n-1}, 1, \cdot)$ which appear
in (3.3) by the "stationary" probabilities introduced in (A1). Define $V^\epsilon_1(n)$ by

$V^\epsilon_1(n) = -2\epsilon V_x(X^\epsilon_n) \sum_{j=n}^\infty [P(\tilde\xi^\epsilon_{n-1}, j-n+1, K(X^\epsilon_n) \,|\, X^\epsilon_n) - P(K(X^\epsilon_n) \,|\, X^\epsilon_n)]$.

$V^\epsilon_1(n)$ is well defined and $O(\epsilon)$ by (A1). $V^\epsilon_1(n)$ is introduced in order to average
out the noise in (3.2). Using the definition of $g(x)$, we have


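$E^\epsilon_n V^\epsilon_1(n+1) - V^\epsilon_1(n) = -\epsilon V_x(X^\epsilon_n)\, g(X^\epsilon_n) - \epsilon V_x(X^\epsilon_n)[1 - 2P(\tilde\xi^\epsilon_{n-1}, 1, K(X^\epsilon_n) \,|\, X^\epsilon_n)] + T^\epsilon_n$,

where

$T^\epsilon_n = -2\epsilon \sum_{j=n+1}^\infty \{E^\epsilon_n V_x(X^\epsilon_{n+1})[P(\tilde\xi^\epsilon_n, j-n, K(X^\epsilon_{n+1}) \,|\, X^\epsilon_{n+1}) - P(K(X^\epsilon_{n+1}) \,|\, X^\epsilon_{n+1})]$
$\qquad - V_x(X^\epsilon_n)[P(\tilde\xi^\epsilon_{n-1}, j-n+1, K(X^\epsilon_n) \,|\, X^\epsilon_n) - P(K(X^\epsilon_n) \,|\, X^\epsilon_n)]\}$.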


We show that $|T^\epsilon_n| = O(\epsilon^2)$.

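First, we simplify $T^\epsilon_n$. Note that replacing $X^\epsilon_{n+1}$ in $V_x(X^\epsilon_{n+1})$ by $X^\epsilon_n$
only alters the sum by $O(\epsilon^2)$. By making this replacement, writing $2\epsilon V_x(X^\epsilon_n) = O(\epsilon)$
and using the Markov property (3.5)*, we have the equivalent form (3.6):

(3.5)    $P(\tilde\xi^\epsilon_{n-1}, j-n+1, K(X^\epsilon_n) \,|\, X^\epsilon_n) = E^\epsilon_n P(\tilde\xi^\epsilon_n, j-n, K(X^\epsilon_n) \,|\, X^\epsilon_n)$,

(3.6)    $T^\epsilon_n = O(\epsilon) \sum_{j=n+1}^\infty E^\epsilon_n \{[P(\tilde\xi^\epsilon_n, j-n, K(X^\epsilon_{n+1}) \,|\, X^\epsilon_{n+1}) - P(K(X^\epsilon_{n+1}) \,|\, X^\epsilon_{n+1})]$
$\qquad - [P(\tilde\xi^\epsilon_n, j-n, K(X^\epsilon_n) \,|\, X^\epsilon_n) - P(K(X^\epsilon_n) \,|\, X^\epsilon_n)]\}$.

By (A3) and the law of the mean, (3.6) equals (writing $\delta X^\epsilon_n = X^\epsilon_{n+1} - X^\epsilon_n$)

$O(\epsilon)\, E^\epsilon_n (X^\epsilon_{n+1} - X^\epsilon_n) \sum_{j=n+1}^\infty \int_0^1 ds\, (P_x(\tilde\xi^\epsilon_n, j-n, K(x) \,|\, x) - P_x(K(x) \,|\, x))$,

where $x = X^\epsilon_n + s\, \delta X^\epsilon_n$ is used in evaluating the argument of the integral. But the
last expression is $O(\epsilon^2)$ by (A3) and the fact that $|\delta X^\epsilon_n| = O(\epsilon)$.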

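Part 2. Define $V^\epsilon(n) = V(X^\epsilon_n) + V^\epsilon_1(n)$ and note that $V^\epsilon_1(n) = O(\epsilon)$.
Summarizing the calculations in Part 1 yields

(3.7)    $E^\epsilon_n V^\epsilon(n+1) - V^\epsilon(n) \le -\epsilon V_x(X^\epsilon_n)\, g_\epsilon(X^\epsilon_n) + O(\epsilon^2)$.

By (A2) and the fact that $X^\epsilon_n \in [x_l, x_u]$, there is a $\gamma > 0$ such that

(3.8)    $E^\epsilon_n V^\epsilon(n+1) - V^\epsilon(n) \le -\epsilon\gamma V(X^\epsilon_n) + O(\epsilon^2)$.

*Recall that $\{\tilde\xi_j(X^\epsilon_n), j \ge n, \tilde\xi_{n-1}(X^\epsilon_n) = \tilde\xi^\epsilon_{n-1}\}$ is a Markov process with initial condi-

The $V(X^\epsilon_n)$ on the right side of (3.8) can be replaced by $V^\epsilon(n)$ without violating
the inequality, since the two differ by $V^\epsilon_1(n) = O(\epsilon)$ and the resulting change is
absorbed in the $O(\epsilon^2)$ term. Consequently, $EV^\epsilon(n) \le (1 - \epsilon\gamma) EV^\epsilon(n-1) + O(\epsilon^2)$, which implies
that $EV^\epsilon(n) = O(\epsilon)$ for large n; i.e., that there is an $N_\epsilon < \infty$ such that $EV^\epsilon(n) =
O(\epsilon)$ for $n \ge N_\epsilon$. Since $V^\epsilon_1(n) = O(\epsilon)$, $EV(X^\epsilon_n) = O(\epsilon)$ for $n \ge N_\epsilon$ also. Equivalently,
$E|U^\epsilon_n|^2 = O(1)$ for $n \ge N_\epsilon$, so (1.3) follows from Chebyshev's inequality and the
theorem is proved. Q.E.D.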
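The last step of the proof can be seen numerically by iterating the bounding recursion $v_n = (1 - \epsilon\gamma) v_{n-1} + C\epsilon^2$. Its fixed point is $C\epsilon/\gamma = O(\epsilon)$, and the number of steps needed to come within a fixed factor of it plays the role of $N_\epsilon$. The constants C, $\gamma$, the starting value and the factor 2 below are arbitrary illustrative choices.

```python
def iterate_bound(eps, gamma, C, v0, n_steps):
    """Iterate v <- (1 - eps*gamma) * v + C * eps^2, starting from v0."""
    v = v0
    for _ in range(n_steps):
        v = (1.0 - eps * gamma) * v + C * eps * eps
    return v


def steps_until_order_eps(eps, gamma, C, v0, factor=2.0):
    """Smallest n with v_n <= factor * (C * eps / gamma); requires 0 < eps*gamma < 1."""
    if not 0.0 < eps * gamma < 1.0:
        raise ValueError("need 0 < eps * gamma < 1")
    limit = C * eps / gamma
    v, n = v0, 0
    while v > factor * limit:
        v = (1.0 - eps * gamma) * v + C * eps * eps
        n += 1
    return n
```

With $\gamma$ = C = $v_0$ = 1, the step count grows roughly like $\epsilon^{-1}\log(1/\epsilon)$ as $\epsilon$ decreases, which is consistent with $N_\epsilon \to \infty$ as $\epsilon \to 0$ in Theorem 2.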

Note on the untruncated $\{X^\epsilon_n\}$ ($-x_l = x_u = \infty$). If we used (1.2) rather
than (1.9), then (3.7) would still hold, and we
would still have $V^\epsilon_1(n) = O(\epsilon)$. But, since (3.8) would not hold, we have not
resolved the problem of showing that (3.7) implies the existence of $N_\epsilon$ such
that $\{U^\epsilon_n, n \ge N_\epsilon$, small $\epsilon\}$ is tight.


IV.  The  Limit  Theorem 


Define

$\sigma^2 = 1 + 2 \sum_{j=1}^\infty E^0\, \mathrm{sign}\, \xi_0(\theta)\, \mathrm{sign}\, \xi_j(\theta)$.


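Theorem 3. Let $\{n_\epsilon\}$ be any sequence such that $n_\epsilon \ge N_\epsilon$. Under (A1) - (A6),
$\{U^\epsilon(\cdot)$, $\epsilon$ small, $U^\epsilon(0) = U^\epsilon_{n_\epsilon}\}$ converges weakly to the diffusion $U(\cdot)$ defined by

(4.1)    $dU = -g_x(\theta)U\,dt + \sigma\,dB$.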


If $n_\epsilon - N_\epsilon \to \infty$ fast enough as $\epsilon \to 0$, then the weak limit is the stationary
solution of (4.1).
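The constant $\sigma^2$ is one plus twice the sum of the stationary correlations $q(m)$ of the sign sequence, so it can be computed from any summable correlation sequence. The sketch below does this and evaluates the resulting small-$\epsilon$ prediction $\epsilon\sigma^2/(2g_x(\theta))$ for the tail variance of $X^\epsilon_n$. The geometric sequence $q(m) = \rho^m$ is a hypothetical example (it arises, e.g., for signs generated by a symmetric two-state Markov chain), and the function names are ours.

```python
def sigma_squared(q):
    """sigma^2 = 1 + 2 * sum_{m >= 1} q(m), for a (truncated) list q of q(m)."""
    return 1.0 + 2.0 * sum(q)


def geometric_q(rho, n_terms):
    """Hypothetical correlation sequence q(m) = rho^m, m = 1, ..., n_terms."""
    return [rho ** m for m in range(1, n_terms + 1)]


def tail_variance_of_iterate(eps, g_slope, q):
    """Small-eps prediction eps * sigma^2 / (2 * g_x(theta)) for Var(X_n)."""
    if g_slope <= 0.0:
        raise ValueError("g_x(theta) must be positive")
    return eps * sigma_squared(q) / (2.0 * g_slope)
```

For independent noise all $q(m)$ vanish and $\sigma^2 = 1$. For $q(m) = \rho^m$ one gets $\sigma^2 = (1+\rho)/(1-\rho)$, so positively correlated signs inflate the asymptotic variance and negatively correlated signs reduce it.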


Proof. Part 1. The main work of the proof is in constructing the $f^\epsilon$ and
verifying (2.3) - (2.5), where the operator A is the differential generator of (4.1).
Then if $\{U^\epsilon(\cdot)$, $\epsilon$ small$\}$ is tight (where $U^\epsilon(0) = U^\epsilon_{n_\epsilon}$), the first conclusion will
follow. The tightness proof will follow readily from Theorem 2 of [7] or [11]. The
assertion concerning stationarity is not hard to get, but the proof is omitted
owing to lack of space. Since $\{U^\epsilon_{n_\epsilon}\}$ is tight, any subsequence contains a further
subsequence which converges weakly (i.e., in distribution). All limits will be
of the form (4.1) and can differ only in the initial condition $U(0)$. Also, any
tight sequence $\{U^\epsilon(0)\}$ contains a subsequence which converges weakly to some
random variable. Thus, in view of Theorem 2, we can, without loss of generality,
assume that $\{U^\epsilon_{n_\epsilon}\}$ converges weakly to some random variable $U(0)$. Noting that
$|U^{\epsilon,N}_n| \le N$, we have $X^{\epsilon,N}_n \in [x_l + \epsilon, x_u - \epsilon]$, and $\theta \in [x_l + \epsilon, x_u - \epsilon]$ for small $\epsilon$.
Suppose that $\epsilon$ is small enough for the last sentence to hold. Then $a_\epsilon(X^{\epsilon,N}_n) =
b_\epsilon(X^{\epsilon,N}_n) = \epsilon$. So the $\{U^{\epsilon,N}_n\}$ for the truncated problem (1.9) - (1.10) are actually
given by (2.2) for small $\epsilon$, and we use (2.2) henceforth.

Part 2.  Fix $f(\cdot,\cdot) \in \hat{C}^{2,3}_0 = \mathcal{D}$, which is dense in $\hat{C}_0$, as required by Theorem 1. We drop the superscripts N for the sake of notational simplicity, but we are actually working with $(U^{\epsilon,N}_n, X^{\epsilon,N}_n, E^{\epsilon,N}_n)$ and not with $(U^\epsilon_n, \ldots)$ henceforth. We have


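(4.2)  $$E^\epsilon_n f(U^\epsilon_{n+1}, n\epsilon+\epsilon) - f(U^\epsilon_n, n\epsilon) = \epsilon f_t(U^\epsilon_n, n\epsilon) + o(\epsilon) + f_u(U^\epsilon_n, n\epsilon)\,E^\epsilon_n(U^\epsilon_{n+1} - U^\epsilon_n) + \frac{\epsilon}{2}\, f_{uu}(U^\epsilon_n, n\epsilon)\, b_N^2(U^\epsilon_n),$$

where

$$E^\epsilon_n(U^\epsilon_{n+1} - U^\epsilon_n) = \sqrt{\epsilon}\, b_N(U^\epsilon_n)\,\bigl[1 - 2P\bigl(\xi^\epsilon_{n-1}, 1, K(\theta + \sqrt{\epsilon}\,U^\epsilon_n) \mid \theta + \sqrt{\epsilon}\,U^\epsilon_n\bigr)\bigr].$$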


Actually, since $U^{\epsilon,N}(0) = U^\epsilon_{n_\epsilon}$ is the initial condition for $U^{\epsilon,N}(\cdot)$, all indices n should be $n + n_\epsilon$ and the time argument $n\epsilon$ should be $n\epsilon - n_\epsilon \epsilon$. But we "shift" to the origin for notational simplicity.




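The third term of (4.2) must be "averaged out", and this requirement determines the form of the $f^\epsilon$. We will look for $f^\epsilon(\cdot)$ of the form $f^\epsilon(n\epsilon) = f(U^\epsilon_n, n\epsilon) + f^\epsilon_0(n\epsilon) + f^\epsilon_1(n\epsilon)$. For our choices, the $f^\epsilon_i$ will all be $O(\sqrt{\epsilon})$ and (2.3) - (2.5) will hold. Define

$$f^\epsilon_0(n\epsilon) = -2\sqrt{\epsilon}\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon) \sum_{j=n}^{\infty} \bigl[P(\xi^\epsilon_{n-1}, j-n+1, K(X^\epsilon_n) \mid X^\epsilon_n) - P(K(X^\epsilon_n) \mid X^\epsilon_n)\bigr].$$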
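The centered sum in the definition of $f^\epsilon_0$ converges when the multi-step conditional probabilities approach their stationary value quickly enough. Condition (A1) is not restated in this section, so the following is only a minimal sketch, assuming a hypothetical two-state Markov noise chain with geometric mixing; the names `centered_sum` and `closed_form` and the transition probabilities `p`, `q` are illustrative and not taken from the paper. It sums $P^m(s, K) - \pi(K)$ over $m \ge 1$ for the set $K = \{1\}$ and compares the result with the closed form available for a two-state chain.

```python
def centered_sum(p, q, start, tol=1e-15, max_terms=10_000):
    """Sum over m >= 1 of [P^m(start, {1}) - pi({1})] for the chain
    with P(0 -> 1) = p and P(1 -> 0) = q."""
    pi1 = p / (p + q)  # stationary probability of state 1
    dist = [1.0, 0.0] if start == 0 else [0.0, 1.0]
    total = 0.0
    for _ in range(max_terms):
        # advance the distribution by one step of the chain
        dist = [dist[0] * (1 - p) + dist[1] * q,
                dist[0] * p + dist[1] * (1 - q)]
        term = dist[1] - pi1
        total += term
        if abs(term) < tol:
            break
    return total


def closed_form(p, q, start):
    """Same sum via P^m = Pi + lam**m (I - Pi), with lam = 1 - p - q."""
    lam = 1 - p - q
    pi1 = p / (p + q)
    indicator = 1.0 if start == 1 else 0.0
    return (indicator - pi1) * lam / (1 - lam)


for s in (0, 1):
    print(s, centered_sum(0.3, 0.2, s), closed_form(0.3, 0.2, s))
```

The terms decay like $\lambda^m$ with $\lambda = 1 - p - q$, so the sum is bounded uniformly in the starting state; a bound of this kind is what makes a perturbation such as $f^\epsilon_0$ of order $\sqrt{\epsilon}$.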

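By (A1), $f^\epsilon_0(n\epsilon)$ is well defined. We have

(4.3)  $$\begin{aligned} E^\epsilon_n f^\epsilon_0(n\epsilon+\epsilon) - f^\epsilon_0(n\epsilon) ={}& o(\epsilon) - [\text{third term of (4.2)}] \\ &+ \sqrt{\epsilon}\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon)\bigl(1 - 2P(K(X^\epsilon_n) \mid X^\epsilon_n)\bigr) \\ &- 2\sqrt{\epsilon} \sum_{j=n+1}^{\infty} E^\epsilon_n\, b_N(U^\epsilon_{n+1}) f_u(U^\epsilon_{n+1}, n\epsilon) \bigl[P(\xi^\epsilon_n, j-n, K(X^\epsilon_{n+1}) \mid X^\epsilon_{n+1}) - P(K(X^\epsilon_{n+1}) \mid X^\epsilon_{n+1})\bigr] \\ &+ 2\sqrt{\epsilon} \sum_{j=n+1}^{\infty} b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon) \bigl[P(\xi^\epsilon_{n-1}, j-n+1, K(X^\epsilon_n) \mid X^\epsilon_n) - P(K(X^\epsilon_n) \mid X^\epsilon_n)\bigr]. \end{aligned}$$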


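Let $T^\epsilon_1$ and $T^\epsilon_2$ denote the last two terms. Noting that $2P(K(x) \mid x) - 1 = g(x)$, we can write the third term on the right of (4.3) as

(4.4)  $$-\sqrt{\epsilon}\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon)\, g(X^\epsilon_n) = -\epsilon\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon)\, g_x(\theta)\, U^\epsilon_n + o(\epsilon).$$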


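Next, we simplify the $T^\epsilon_i$. Write $b_N(U^\epsilon_{n+1}) f_u(U^\epsilon_{n+1}, n\epsilon) = b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon) + (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u (U^\epsilon_{n+1} - U^\epsilon_n) + O(\epsilon)$, and split $T^\epsilon_1$ (the sum containing $U^\epsilon_{n+1}$) into the three corresponding sums: $T^\epsilon_1 = T^\epsilon_{11} + T^\epsilon_{12} + T^\epsilon_{13}$. Now $T^\epsilon_{13} = O(\epsilon^{3/2}) = o(\epsilon)$. Noting (3.5), combine $T^\epsilon_{11} + T^\epsilon_2$ and use a differentiability argument such as used below (3.6), together with the fact that $|X^\epsilon_{n+1} - X^\epsilon_n| \le \epsilon$, to get that $T^\epsilon_{11} + T^\epsilon_2 = o(\epsilon)$. The above simplifications of (4.3) yield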


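(4.5)  $$\begin{aligned} E^\epsilon_n f^\epsilon_0(n\epsilon+\epsilon) - f^\epsilon_0(n\epsilon) ={}& -\epsilon\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon)\, g_x(\theta)\, U^\epsilon_n + o(\epsilon) - [\text{third term of (4.2)}] \\ &- 2\sqrt{\epsilon} \sum_{j=n+1}^{\infty} (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u\, E^\epsilon_n (U^\epsilon_{n+1} - U^\epsilon_n) \bigl[P(\xi^\epsilon_n, j-n, K(X^\epsilon_{n+1}) \mid X^\epsilon_{n+1}) - P(K(X^\epsilon_{n+1}) \mid X^\epsilon_{n+1})\bigr]. \end{aligned}$$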


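By another differentiability argument it can be shown that if the $X^\epsilon_{n+1}$ in the P terms of (4.5) are replaced by $\theta$, then the sum changes only by $o(\epsilon)$. Make this replacement and note that $P(K(\theta) \mid \theta) = 1/2$. The revised sum in (4.5) is

(4.6)  $$\epsilon\, (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u \sum_{j=n+1}^{\infty} 2 E^\epsilon_n\, \mathrm{sign}[k(X^\epsilon_n) + \xi^\epsilon_n] \bigl[P(\xi^\epsilon_n, j-n, K(\theta) \mid \theta) - \tfrac{1}{2}\bigr] + o(\epsilon).$$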

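Next, it is not hard to see that by dropping the $k(X^\epsilon_n)$ term in (4.6), the sum changes only by $o(\epsilon)$. To see this, use (A1) and the fact that $|U^\epsilon_n| \le N+1$ and also (implied by (A5)) $E^\epsilon_n\, |\mathrm{sign}[k(\theta + \sqrt{\epsilon}\,U^\epsilon_n) + \xi^\epsilon_n] - \mathrm{sign}(\xi^\epsilon_n)| \to 0$ as $\epsilon \to 0$, uniformly in $\xi^\epsilon_{n-1}$, and in $X^\epsilon_n \in [x_\ell, x_u]$. Let us drop this $k(X^\epsilon_n)$ term. In addition, (A4) and (A1) imply that if the $E^\epsilon_n$ in (4.6) were replaced by $E^\theta_n$ (and $\xi^\epsilon_n$ by $\xi_n(\theta)$), then the sum would change by $o(\epsilon)$ only. This last assertion follows from the observation that, after the $k(X^\epsilon_n)$ is dropped, the jth summand is actually $E^\epsilon_n (\mathrm{sign}\, \xi^\epsilon_n)\, E^\theta_{n+1}\, \mathrm{sign}\, \xi_j(\theta)$. Then note that

$$E^\theta_n\, \mathrm{sign}\, \xi_n(\theta) \sum_{j=n+1}^{\infty} \mathrm{sign}\, \xi_j(\theta) - E^\epsilon_n\, \mathrm{sign}\, \xi^\epsilon_n \sum_{j=n+1}^{\infty} E^\theta_{n+1}\, \mathrm{sign}\, \xi_j(\theta) = E^\theta_n F(\xi_n(\theta)) - E^\epsilon_n F(\xi^\epsilon_n),$$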
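where, by (A1), $F(\cdot)$ is uniformly bounded in n, $\epsilon$. Now use (A4). Then we can write

(4.7)  $$(4.6) = o(\epsilon) + \epsilon\, (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u \sum_{j=n+1}^{\infty} E^\theta_n\, \mathrm{sign}\, \xi_n(\theta)\, \mathrm{sign}\, \xi_j(\theta).$$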


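Part 3.  We now "average out" the sum in (4.7). Define

$$f^\epsilon_1(n\epsilon) = \epsilon\, (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u \sum_{j=n}^{\infty} \sum_{\ell=j+1}^{\infty} \bigl[E^\theta_n\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_\ell(\theta) - E^\theta\, \mathrm{sign}\, \xi_j(\theta)\, \mathrm{sign}\, \xi_\ell(\theta)\bigr].$$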


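The sum is well defined by (A6). By an argument similar to that used in Part 2, we can show that

$$E^\epsilon_n f^\epsilon_1(n\epsilon+\epsilon) - f^\epsilon_1(n\epsilon) = -(4.7) + o(\epsilon) + \epsilon\, (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u \sum_{j=1}^{\infty} E^\theta\, \mathrm{sign}\, \xi_0(\theta)\, \mathrm{sign}\, \xi_j(\theta).$$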


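Finally, summing up the calculations and cancelling terms whenever possible yields

$$|f^\epsilon_0(n\epsilon)| + |f^\epsilon_1(n\epsilon)| = O(\sqrt{\epsilon}),$$

$$\begin{aligned} E^\epsilon_n f^\epsilon(n\epsilon+\epsilon) - f^\epsilon(n\epsilon) ={}& o(\epsilon) + \epsilon f_t(U^\epsilon_n, n\epsilon) - \epsilon\, b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon)\, g_x(\theta)\, U^\epsilon_n \\ &+ \frac{\epsilon}{2}\, b_N^2(U^\epsilon_n) f_{uu}(U^\epsilon_n, n\epsilon) + \epsilon\, (b_N(U^\epsilon_n) f_u(U^\epsilon_n, n\epsilon))_u \sum_{j=1}^{\infty} E^\theta\, \mathrm{sign}\, \xi_0(\theta)\, \mathrm{sign}\, \xi_j(\theta). \end{aligned}$$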

Thus (2.3) - (2.5) hold for an operator $A^N$ which equals the operator A of (4.1) in each $S_N$.
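To see why the limit operator is the generator of (4.1), it may help to collect the terms of order $\epsilon$ from Parts 2 and 3. The following is a sketch under two assumptions that are not stated in this section: that $b_N \equiv 1$ on $S_N$, and that the second-order term of (4.2) is $\frac{\epsilon}{2} b_N^2 f_{uu}$. The resulting expression for $\sigma^2$ is an inference from those terms, not a formula quoted from the text.

```latex
\begin{aligned}
A f(u,t) &= f_t(u,t) - g_x(\theta)\, u\, f_u(u,t) + \tfrac{1}{2}\sigma^2 f_{uu}(u,t), \\
\sigma^2 &= 1 + 2 \sum_{j=1}^{\infty} E^\theta\, \operatorname{sign}\xi_0(\theta)\, \operatorname{sign}\xi_j(\theta).
\end{aligned}
```

The first line is the standard generator of $dU = -g_x(\theta)U\,dt + \sigma\,dB$ acting on functions of (u, t); the second line matches the coefficient of $f_{uu}$, where the 1 comes from the second-order term of (4.2) and the sum from the correlation term left over after averaging.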

Tightness of $\{U^{\epsilon,N}(\cdot)$, small $\epsilon\}$.  The tightness proof of Theorem 2 of [7] or [11] requires the construction of $f^\epsilon(n\epsilon)$ similar to those used here - for each N. But since our $f^\epsilon_i(n\epsilon)$ are $O(\sqrt{\epsilon})$, the conditions of the cited theorem hold and we get the tightness immediately in our case.  Q.E.D.
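The conclusion of Theorem 3 can be checked numerically on a toy case. The following is a minimal sketch, assuming a hypothetical scalar recursion of the sign type suggested by (4.6), $X_{n+1} = X_n - \epsilon\,\mathrm{sign}[k(X_n) + \xi_n]$, with $k(x) = x - \theta$ and i.i.d. noise uniform on $[-1, 1]$. None of these choices is taken from the paper; they are picked because then $g(x) = x - \theta$ near $\theta$, so $g_x(\theta) = 1$, and the stationary variance of $U_n = (X_n - \theta)/\sqrt{\epsilon}$ can be computed directly to be 1/2.

```python
import math
import random


def mean_square_u(eps, theta, x0, steps, burn_in, seed=0):
    """Time average of U_n**2, with U_n = (X_n - theta) / sqrt(eps), for the
    hypothetical recursion X_{n+1} = X_n - eps * sign((X_n - theta) + xi_n)."""
    rng = random.Random(seed)
    x = x0
    total, count = 0.0, 0
    for n in range(steps):
        xi = rng.uniform(-1.0, 1.0)  # hypothetical i.i.d. noise, illustration only
        x -= eps * math.copysign(1.0, (x - theta) + xi)
        if n >= burn_in:  # skip the transient from the initial condition
            u = (x - theta) / math.sqrt(eps)
            total += u * u
            count += 1
    return total / count


estimate = mean_square_u(eps=0.01, theta=2.0, x0=2.5,
                         steps=1_000_000, burn_in=5_000)
print("time average of U^2:", estimate, "(stationary value for this toy case: 0.5)")
```

The burn-in plays the role of the shift by $n_\epsilon$: after enough iterations the rescaled error has forgotten its starting point, and its fluctuations are of order one even though $\epsilon$ is small.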

Remark.  In more general cases the $f^{\epsilon,N}$ are chosen in a similar way: $f^{\epsilon,N} = f +$ small perturbation. First we obtain an expansion (up to $o(\epsilon)$) of $E^{\epsilon,N}_n f(Y^{\epsilon,N}_{n+1}, n\epsilon+\epsilon) - f(Y^{\epsilon,N}_n, \epsilon n)$. Then check which terms in the expansion need to be averaged out - or replaced by an "average value". These will be the terms which do not depend solely on $Y^{\epsilon,N}_n$, $\epsilon n$. Then the sum $f^{\epsilon,N}_0$ is introduced (centered about a "mean value" - which is the averaged replacement for the undesirable term). Continue as in the proof, building up the operator $A^N$ step by step.